grants_final
grant_code; Unique identifier code for each grant.
year; Year in which the grant application was submitted
(1994 to 2016).
area_name; Research area of the grant application: one of
eight areas, such as “Math & Computer Science”.
project_title_ru; Title of the grant project in
Russian.
status; Status of the grant application
(“accepted” or “rejected”).
project_type_en; Type of competition in English
(e.g., “initiative research projects”, “projects of young
researchers”). This variable is not yet final. It could be used to
define several categories of competition that differ in the resources
provided and in the requirements placed on the PI. Working with it
effectively requires a specialist who understands the differences
between these competition types; we have access to such a specialist.
project_type_ru_raw; Full, uncleaned name of the competition
type in Russian. project_type_en was derived from this variable
through a rough initial cleanup and classification.
gender; Gender of the principal investigator
(“male”, “female”, “unknown”).
family_name_pi_ru; Family name of the principal
investigator in Russian. This variable allows us to infer gender
from the surname ending (male suffixes: -ов, -ий, -ин, -ев, -ый;
female suffixes: -ва, -ая, -на).
abstact_ru_raw; Raw abstract of the project in
Russian.
title_length; Length of the project title (number of
characters).
abst_length; Length of the project abstract (number
of characters), where available.
abst_have; Indicates whether an abstract is
available for the project (“have”, “no_abs”).
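The derived variables in this list (gender, title_length, abst_length, abst_have) could be computed along the following lines. This is a minimal sketch, not the procedure actually used to build grants_final: the function names `infer_gender` and `derive_fields` are hypothetical, the suffix lists are copied from the family_name_pi_ru description, and the treatment of an empty or whitespace-only abstract as missing is an assumption.

```python
from typing import Optional

# Suffix lists copied from the family_name_pi_ru description.
MALE_SUFFIXES = ("ов", "ий", "ин", "ев", "ый")
FEMALE_SUFFIXES = ("ва", "ая", "на")


def infer_gender(family_name: str) -> str:
    """Return "male", "female", or "unknown" based on the surname ending."""
    name = family_name.strip().lower()
    # The two suffix sets end in different letters, so at most one can match.
    if name.endswith(FEMALE_SUFFIXES):
        return "female"
    if name.endswith(MALE_SUFFIXES):
        return "male"
    return "unknown"


def derive_fields(title: str, abstract: Optional[str]) -> dict:
    """Compute title_length, abst_have and abst_length for one record."""
    # Assumption: None, empty, or whitespace-only abstracts count as missing.
    has_abstract = bool(abstract and abstract.strip())
    return {
        "title_length": len(title),
        "abst_have": "have" if has_abstract else "no_abs",
        "abst_length": len(abstract) if has_abstract else None,
    }
```

Surnames that match neither suffix set (for example, those ending in -ко) fall through to “unknown”, which is consistent with the three gender values listed above.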
The dataset grants_final consists of 304173
rows.
There are 8 scientific fields, and 4599 observations (1.5%)
lack an area_name; these are coded as “unknown”.
grants_final Dataset

| | Overall (N=304173) |
|---|---|
| area_name | |
| Biology & Medical Sciences | 64356 (21.2%) |
| Chemistry & Material Sciences | 42893 (14.1%) |
| Earth Sciences | 42968 (14.1%) |
| Engineering | 25334 (8.3%) |
| Humanities & Social Sciences | 17472 (5.7%) |
| IT | 17710 (5.8%) |
| Math & Computer Science | 31643 (10.4%) |
| Physics & Astronomy | 57198 (18.8%) |
| unknown | 4599 (1.5%) |
| status | |
| accepted | 98825 (32.5%) |
| rejected | 205348 (67.5%) |
| title_length | |
| Mean (SD) | 116 (48.7) |
| Median [Min, Max] | 109 [11.0, 953] |
| abst_have | |
| have | 85828 (28.2%) |
| no_abs | 218345 (71.8%) |
| abst_length | |
| Mean (SD) | 1660 (750) |
| Median [Min, Max] | 1570 [21.0, 7690] |
| Missing | 218345 (71.8%) |
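
The categorical counts in the table can be checked for internal consistency: within each variable the counts should sum to N, and each percentage should match its count to one decimal place. The sketch below does this with the figures copied from the table; the names `TABLE` and `check_block` are illustrative and not part of the original analysis.

```python
N = 304173  # total number of rows in grants_final

# (count, reported percentage) pairs copied from the summary table.
TABLE = {
    "area_name": {
        "Biology & Medical Sciences": (64356, 21.2),
        "Chemistry & Material Sciences": (42893, 14.1),
        "Earth Sciences": (42968, 14.1),
        "Engineering": (25334, 8.3),
        "Humanities & Social Sciences": (17472, 5.7),
        "IT": (17710, 5.8),
        "Math & Computer Science": (31643, 10.4),
        "Physics & Astronomy": (57198, 18.8),
        "unknown": (4599, 1.5),
    },
    "status": {
        "accepted": (98825, 32.5),
        "rejected": (205348, 67.5),
    },
    "abst_have": {
        "have": (85828, 28.2),
        "no_abs": (218345, 71.8),
    },
}


def check_block(levels: dict, total: int) -> bool:
    """True if counts sum to total and each percentage is within rounding of its count."""
    counts_ok = sum(count for count, _ in levels.values()) == total
    pct_ok = all(abs(100 * count / total - pct) <= 0.05 for count, pct in levels.values())
    return counts_ok and pct_ok


for variable, levels in TABLE.items():
    print(variable, "consistent" if check_block(levels, N) else "INCONSISTENT")
```

The number of missing abst_length values (218345) also equals the no_abs count, as expected if abst_length is defined only where an abstract is available.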
[Figures omitted; axis labels: year, title_length]
The source dataset is grants_final, which contains 304173 rows. We
discard about 88,000 values of project_title_ru that do not have a
proper full title. We also discard grants for participation in
conferences, grants for conducting events, and grants without a
detailed specification of their content and specific topics. The
complete list of discarded project_title_ru values is presented in
Table 2.
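
One way to apply such a discard list is an exact match on a normalized title. The sketch below is only an illustration under stated assumptions: the helper names `normalize_title` and `drop_discarded` are hypothetical, the normalization (case folding and whitespace collapsing) is an assumption, and the list of discarded titles must be supplied by the caller from Table 2, whose contents are not reproduced here.

```python
import re


def normalize_title(title: str) -> str:
    """Case-fold a title and collapse runs of whitespace (an assumed normalization)."""
    return re.sub(r"\s+", " ", title).strip().casefold()


def drop_discarded(rows, discarded_titles):
    """Split rows into (kept, dropped) by exact match on the normalized project_title_ru."""
    discarded = {normalize_title(t) for t in discarded_titles}
    kept, dropped = [], []
    for row in rows:
        is_discarded = normalize_title(row["project_title_ru"]) in discarded
        (dropped if is_discarded else kept).append(row)
    return kept, dropped
```

Returning the dropped rows alongside the kept ones makes it easy to compare the number of discarded records against the figure of about 88,000 reported above.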